View Validation: A Case Study for Wrapper Induction and Text Classification

نویسندگان

  • Ion Muslea
  • Steven Minton
  • Craig A. Knoblock
چکیده

Wrapper induction algorithms, which use labeled examples to learn extraction rules, are a crucial component of information agents that integrate semi-structured information sources. Multi-view wrapper induction algorithms reduce the amount of training data by exploiting several types of rules (i.e., views), each of which being sufficient to extract the relevant data. All multiview algorithms rely on the assumption that the views are sufficiently compatible for multi-view learning (i.e., most examples are labeled identically in all views). In practice, it is unclear whether or not two views are sufficiently compatible for solving a new, unseen learning task. In order to cope with this problem, we introduce a view validation algorithm: given a learning task, the algorithm predicts whether or not the views are sufficiently compatible for solving that particular task. We use information acquired while solving several exemplar learning tasks to train a classifier that discriminates between the tasks for which the views are sufficiently and insufficiently compatible for multi-view learning. For both wrapper induction and text classification, view validation requires only a modest amount of training data to make high accuracy predictions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive View Validation: A First Step Towards Automatic View Detection

Multi-view algorithms reduce the amount of required training data by partitioning the domain features into separate subsets or views that are sufficient to learn the target concept. Such algorithms rely on the assumption that the views are sufficiently compatible for multi-view learning (i.e., most examples are labeled identically in all views). In practice, it is unclear whether or not two vie...

متن کامل

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

متن کامل

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...

متن کامل

Active Learning with Strong and Weak Views: A Case Study on Wrapper Induction

Multi-view learners reduce the need for labeled data by exploiting disjoint sub-sets of features (views), each of which is sufficient for learning. Such algorithms assume that each view is a strong view (i.e., perfect learning is possible in each view). We extend the multi-view framework by introducing a novel algorithm, Aggressive Co-Testing, that exploits both strong and weak views; in a weak...

متن کامل

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods.  In filter methods, features subsets are selected due to some measu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002